Perplexity Minimization for Translation Model Domain Adaptation in Statistical Machine Translation
نویسنده
چکیده
We investigate the problem of domain adaptation for parallel data in Statistical Machine Translation (SMT). While techniques for domain adaptation of monolingual data can be borrowed for parallel data, we explore conceptual differences between translation model and language model domain adaptation and their effect on performance, such as the fact that translation models typically consist of several features that have different characteristics and can be optimized separately. We also explore adapting multiple (4–10) data sets with no a priori distinction between in-domain and out-of-domain data except for an in-domain development set.
منابع مشابه
A Tunable Language Model for Statistical Machine Translation
A novel variation of modified KNESER-NEY model using monomial discounting is presented and integrated into the MOSES statistical machine translation toolkit. The language model is trained on a large training set as usual, but its new discount parameters are tuned to the small development set. An in-domain and cross-domain evaluation of the language model is performed based on perplexity, in whi...
متن کاملFast Gated Neural Domain Adaptation: Language Model as a Case Study
Neural network training has been shown to be advantageous in many natural language processing applications, such as language modelling or machine translation. In this paper, we describe in detail a novel domain adaptation mechanism in neural network training. Instead of learning and adapting the neural network on millions of training sentences – which can be very timeconsuming or even infeasibl...
متن کاملLanguage Model Adaptation for Statistical Machine Translation Based on Information Retrieval
Language modeling is an important part for both speech recognition and machine translation systems. Adaptation has been successfully applied to language models for speech recognition. In this paper we present experiments concerning language model adaptation for statistical machine translation. We develop a method to adapt language models using information retrieval methods. The adapted language...
متن کاملExtracting and Selecting Relevant Corpora for Domain Adaptation in MT
The paper presents scheme for doing Domain Adaptation for multiple domains simultaneously. The proposed method segments a large corpus into various parts using self-organizing maps (SOMs). After a SOM is drawn over the documents, an agglomerative clustering algorithm determines how many clusters the text collection comprised. This means that the clustering process is unsupervised, although choi...
متن کاملCombining translation and language model scoring for domain-specific data filtering
The increasing popularity of statistical machine translation (SMT) systems is introducing new domains of translation that need to be tackled. As many resources are already available, domain adaptation methods can be applied to utilize these recourses in the most beneficial way for the new domain. We explore adaptation via filtering, using the crossentropy scores to discard irrelevant sentences....
متن کامل